Dataset statistics
| | Dataset A | Dataset B |
|---|---|---|
| Number of variables | 12 | 12 |
| Number of observations | 446 | 446 |
| Missing cells | 420 | 424 |
| Missing cells (%) | 7.8% | 7.9% |
| Duplicate rows | 0 | 0 |
| Duplicate rows (%) | 0.0% | 0.0% |
| Total size in memory | 45.3 KiB | 45.3 KiB |
| Average record size in memory | 104.0 B | 104.0 B |
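The missing-cell percentages above can be reproduced from the other rows of the table, assuming the denominator is the total number of cells (variables times observations). The sketch below is illustrative only; the constant and function names are hypothetical and not part of the report.

```python
# Values copied from the "Dataset statistics" table above.
N_VARIABLES = 12
N_OBSERVATIONS = 446
MISSING_CELLS = {"Dataset A": 420, "Dataset B": 424}


def missing_cells_pct(missing: int, n_vars: int, n_obs: int) -> float:
    """Share of missing cells as a percentage of all cells (rows x columns)."""
    return 100.0 * missing / (n_vars * n_obs)


# Assumption: the report divides by every cell in the table, not by rows.
missing_pct = {
    name: missing_cells_pct(missing, N_VARIABLES, N_OBSERVATIONS)
    for name, missing in MISSING_CELLS.items()
}

for name, pct in missing_pct.items():
    print(f"{name}: {pct:.1f}%")
```

Rounded to one decimal place, the results agree with the "Missing cells (%)" row.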
Variable types
| | Dataset A | Dataset B |
|---|---|---|
| Numeric | 5 | 5 |
| Categorical | 4 | 4 |
| Text | 3 | 3 |
Alerts
| Dataset A | Dataset B | Alert type |
|---|---|---|
| Fare is highly overall correlated with Pclass | Alert not present in this dataset | High Correlation |
| Survived is highly overall correlated with Sex | Survived is highly overall correlated with Sex | High Correlation |
| Pclass is highly overall correlated with Fare | Alert not present in this dataset | High Correlation |
| Sex is highly overall correlated with Survived | Sex is highly overall correlated with Survived | High Correlation |
| Age has 82 (18.4%) missing values | Age has 81 (18.2%) missing values | Missing |
| Cabin has 337 (75.6%) missing values | Cabin has 342 (76.7%) missing values | Missing |
| PassengerId has unique values | PassengerId has unique values | Unique |
| Name has unique values | Name has unique values | Unique |
| SibSp has 292 (65.5%) zeros | SibSp has 303 (67.9%) zeros | Zeros |
| Parch has 338 (75.8%) zeros | Parch has 332 (74.4%) zeros | Zeros |
| Fare has 9 (2.0%) zeros | Fare has 9 (2.0%) zeros | Zeros |
Reproduction
| | Dataset A | Dataset B |
|---|---|---|
| Analysis started | 2023-10-24 11:38:52.006851 | 2023-10-24 11:38:59.504695 |
| Analysis finished | 2023-10-24 11:38:59.502369 | 2023-10-24 11:39:06.316978 |
| Duration | 7.5 seconds | 6.81 seconds |
| Software version | ydata-profiling v0.0.dev0 | ydata-profiling v0.0.dev0 |
| Download configuration | config.json | config.json |
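The "Duration" row follows from the start and finish timestamps. As a minimal sketch, assuming the timestamps use the format shown in the table, the standard library can recompute both durations; the names `FMT`, `runs` and `duration_seconds` are hypothetical.

```python
from datetime import datetime

# Timestamp layout as printed in the "Reproduction" table (assumed format).
FMT = "%Y-%m-%d %H:%M:%S.%f"

# (analysis started, analysis finished), copied from the table above.
runs = {
    "Dataset A": ("2023-10-24 11:38:52.006851", "2023-10-24 11:38:59.502369"),
    "Dataset B": ("2023-10-24 11:38:59.504695", "2023-10-24 11:39:06.316978"),
}


def duration_seconds(started: str, finished: str) -> float:
    """Elapsed wall-clock time between two timestamps, in seconds."""
    delta = datetime.strptime(finished, FMT) - datetime.strptime(started, FMT)
    return delta.total_seconds()


durations = {name: duration_seconds(*stamps) for name, stamps in runs.items()}

for name, seconds in durations.items():
    print(f"{name}: {seconds:.2f} seconds")
```

The timestamps also show that the Dataset B analysis started about 2 ms after the Dataset A analysis finished, which suggests the two profiles were computed one after the other.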
PassengerId
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 446 | 446 |
| Distinct (%) | 100.0% | 100.0% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 445.52018 | 438.15471 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 1 | 1 |
| Maximum | 891 | 891 |
| Zeros | 0 | 0 |
| Zeros (%) | 0.0% | 0.0% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 1 | 1 |
| 5-th percentile | 37 | 48 |
| Q1 | 219 | 216.25 |
| median | 448 | 419 |
| Q3 | 663.75 | 662.25 |
| 95-th percentile | 845.75 | 851.75 |
| Maximum | 891 | 891 |
| Range | 890 | 890 |
| Interquartile range (IQR) | 444.75 | 446 |
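The last two rows of the quantile table are derived from the rows above them: the range is maximum minus minimum, and the IQR is Q3 minus Q1. A short sketch using the PassengerId values (the dictionary layout and the `spread` helper are hypothetical):

```python
# Quantiles copied from the PassengerId "Quantile statistics" table above.
quantiles = {
    "Dataset A": {"min": 1, "q1": 219, "q3": 663.75, "max": 891},
    "Dataset B": {"min": 1, "q1": 216.25, "q3": 662.25, "max": 891},
}


def spread(q: dict) -> dict:
    """Range (max - min) and interquartile range (Q3 - Q1)."""
    return {"range": q["max"] - q["min"], "iqr": q["q3"] - q["q1"]}


spreads = {name: spread(q) for name, q in quantiles.items()}

for name, s in spreads.items():
    print(name, s)
```

The same two subtractions apply to the quantile tables of the other numeric variables (Age, SibSp, Parch, Fare).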
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 258.20118 | 258.77 |
| Coefficient of variation (CV) | 0.57954991 | 0.59059047 |
| Kurtosis | -1.1891536 | -1.1980558 |
| Mean | 445.52018 | 438.15471 |
| Median Absolute Deviation (MAD) | 221.5 | 218.5 |
| Skewness | -0.0058509045 | 0.071107097 |
| Sum | 198702 | 195417 |
| Variance | 66667.85 | 66961.911 |
| Monotonicity | Not monotonic | Not monotonic |
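Several descriptive statistics are linked by simple identities: the mean is the sum divided by the number of observations, the coefficient of variation is the standard deviation divided by the mean, and the variance is the standard deviation squared. The sketch below checks this against the PassengerId values above. It assumes all 446 observations are non-missing, as the table reports, and it works from the rounded figures printed in the report, so small discrepancies are expected. The names `reported` and `derive` are hypothetical.

```python
# Values copied from the PassengerId tables above; N is the number of observations.
N = 446
reported = {
    "Dataset A": {"sum": 198702, "std": 258.20118},
    "Dataset B": {"sum": 195417, "std": 258.77},
}


def derive(stats: dict, n: int) -> dict:
    """Recompute mean, coefficient of variation and variance from sum and std."""
    mean = stats["sum"] / n
    return {
        "mean": mean,                    # mean = sum / n
        "cv": stats["std"] / mean,       # CV = standard deviation / mean
        "variance": stats["std"] ** 2,   # variance = standard deviation squared
    }


derived = {name: derive(stats, N) for name, stats in reported.items()}

for name, d in derived.items():
    print(name, round(d["mean"], 5), round(d["cv"], 6), round(d["variance"], 2))
```

Within rounding, the recomputed values agree with the Mean, Coefficient of variation (CV) and Variance rows for both datasets.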
| Value | Count | Frequency (%) |
| 797 | 1 | 0.2% |
| 109 | 1 | 0.2% |
| 197 | 1 | 0.2% |
| 244 | 1 | 0.2% |
| 674 | 1 | 0.2% |
| 523 | 1 | 0.2% |
| 605 | 1 | 0.2% |
| 74 | 1 | 0.2% |
| 649 | 1 | 0.2% |
| 248 | 1 | 0.2% |
| Other values (436) | 436 |
| Value | Count | Frequency (%) |
| 672 | 1 | 0.2% |
| 98 | 1 | 0.2% |
| 633 | 1 | 0.2% |
| 652 | 1 | 0.2% |
| 311 | 1 | 0.2% |
| 862 | 1 | 0.2% |
| 80 | 1 | 0.2% |
| 226 | 1 | 0.2% |
| 343 | 1 | 0.2% |
| 254 | 1 | 0.2% |
| Other values (436) | 436 |
| Value | Count | Frequency (%) |
| 1 | 1 | |
| 2 | 1 | |
| 4 | 1 | |
| 5 | 1 | |
| 6 | 1 | |
| 7 | 1 | |
| 8 | 1 | |
| 10 | 1 | |
| 13 | 1 | |
| 14 | 1 |
| Value | Count | Frequency (%) |
| 1 | 1 | |
| 2 | 1 | |
| 3 | 1 | |
| 6 | 1 | |
| 8 | 1 | |
| 10 | 1 | |
| 12 | 1 | |
| 13 | 1 | |
| 15 | 1 | |
| 17 | 1 |
Survived
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 2 | 2 |
| Distinct (%) | 0.4% | 0.4% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 446 | 446 |
| Distinct characters | 2 | 2 |
| Distinct categories | 1 | 1 |
| Distinct scripts | 1 | 1 |
| Distinct blocks | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 1 | 0 |
| 2nd row | 1 | 1 |
| 3rd row | 1 | 0 |
| 4th row | 0 | 1 |
| 5th row | 0 | 1 |
Common Values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 269 | 60.3% |
| 1 | 177 | 39.7% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 285 | 63.9% |
| 1 | 161 | 36.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 269 | |
| 1 | 177 |
| Value | Count | Frequency (%) |
| 0 | 285 | |
| 1 | 161 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 446 |
| Value | Count | Frequency (%) |
| Decimal Number | 446 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 269 | |
| 1 | 177 |
| Value | Count | Frequency (%) |
| 0 | 285 | |
| 1 | 161 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 446 |
| Value | Count | Frequency (%) |
| Common | 446 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 269 | |
| 1 | 177 |
| Value | Count | Frequency (%) |
| 0 | 285 | |
| 1 | 161 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 446 |
| Value | Count | Frequency (%) |
| ASCII | 446 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 269 | |
| 1 | 177 |
| Value | Count | Frequency (%) |
| 0 | 285 | |
| 1 | 161 |
Pclass
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 3 | 3 |
| Distinct (%) | 0.7% | 0.7% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 446 | 446 |
| Distinct characters | 3 | 3 |
| Distinct categories | 1 | 1 |
| Distinct scripts | 1 | 1 |
| Distinct blocks | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 1 | 1 |
| 2nd row | 3 | 2 |
| 3rd row | 1 | 2 |
| 4th row | 2 | 1 |
| 5th row | 2 | 3 |
Common Values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 240 | 53.8% |
| 1 | 116 | 26.0% |
| 2 | 90 | 20.2% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 236 | 52.9% |
| 1 | 110 | 24.7% |
| 2 | 100 | 22.4% |
Most occurring characters
| Value | Count | Frequency (%) |
| 3 | 240 | |
| 1 | 116 | |
| 2 | 90 | 20.2% |
| Value | Count | Frequency (%) |
| 3 | 236 | |
| 1 | 110 | |
| 2 | 100 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 446 |
| Value | Count | Frequency (%) |
| Decimal Number | 446 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 3 | 240 | |
| 1 | 116 | |
| 2 | 90 | 20.2% |
| Value | Count | Frequency (%) |
| 3 | 236 | |
| 1 | 110 | |
| 2 | 100 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 446 |
| Value | Count | Frequency (%) |
| Common | 446 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 3 | 240 | |
| 1 | 116 | |
| 2 | 90 | 20.2% |
| Value | Count | Frequency (%) |
| 3 | 236 | |
| 1 | 110 | |
| 2 | 100 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 446 |
| Value | Count | Frequency (%) |
| ASCII | 446 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 3 | 240 | |
| 1 | 116 | |
| 2 | 90 | 20.2% |
| Value | Count | Frequency (%) |
| 3 | 236 | |
| 1 | 110 | |
| 2 | 100 |
Name
Text
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 446 | 446 |
| Distinct (%) | 100.0% | 100.0% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 82 | 82 |
| Median length | 51 | 50 |
| Mean length | 27.192825 | 26.829596 |
| Min length | 13 | 13 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 12128 | 11966 |
| Distinct characters | 59 | 60 |
| Distinct categories | 7 | 7 |
| Distinct scripts | 2 | 2 |
| Distinct blocks | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 446 | 446 |
| Unique (%) | 100.0% | 100.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | Leader, Dr. Alice (Farnham) | Davidson, Mr. Thornton |
| 2nd row | Connolly, Miss. Kate | Richards, Master. William Rowe |
| 3rd row | Dick, Mr. Albert Adrian | Nicholls, Mr. Joseph Charles |
| 4th row | Pernot, Mr. Rene | Silvey, Mrs. William Baird (Alice Munger) |
| 5th row | Collander, Mr. Erik Gustaf | Asplund, Master. Edvin Rojj Felix |
| Value | Count | Frequency (%) |
| mr | 266 | 14.5% |
| miss | 84 | 4.6% |
| mrs | 72 | 3.9% |
| william | 36 | 2.0% |
| john | 21 | 1.1% |
| master | 17 | 0.9% |
| henry | 16 | 0.9% |
| george | 14 | 0.8% |
| charles | 12 | 0.7% |
| mary | 12 | 0.7% |
| Other values (887) | 1283 |
| Value | Count | Frequency (%) |
| mr | 260 | 14.4% |
| miss | 92 | 5.1% |
| mrs | 61 | 3.4% |
| william | 32 | 1.8% |
| john | 21 | 1.2% |
| master | 20 | 1.1% |
| henry | 14 | 0.8% |
| charles | 14 | 0.8% |
| thomas | 13 | 0.7% |
| george | 13 | 0.7% |
| Other values (872) | 1261 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 1388 | 11.4% |
| r | 989 | 8.2% |
| e | 861 | 7.1% |
| a | 813 | 6.7% |
| n | 661 | 5.5% |
| i | 660 | 5.4% |
| s | 635 | 5.2% |
| M | 574 | 4.7% |
| l | 538 | 4.4% |
| o | 508 | 4.2% |
| Other values (49) | 4501 |
| Value | Count | Frequency (%) |
| (space) | 1356 | 11.3% |
| r | 965 | 8.1% |
| e | 876 | 7.3% |
| a | 834 | 7.0% |
| n | 662 | 5.5% |
| s | 654 | 5.5% |
| i | 644 | 5.4% |
| M | 550 | 4.6% |
| l | 517 | 4.3% |
| o | 502 | 4.2% |
| Other values (50) | 4406 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 7788 | |
| Uppercase Letter | 1840 | 15.2% |
| Space Separator | 1388 | 11.4% |
| Other Punctuation | 945 | 7.8% |
| Close Punctuation | 81 | 0.7% |
| Open Punctuation | 81 | 0.7% |
| Dash Punctuation | 5 | < 0.1% |
| Value | Count | Frequency (%) |
| Lowercase Letter | 7714 | |
| Uppercase Letter | 1806 | 15.1% |
| Space Separator | 1356 | 11.3% |
| Other Punctuation | 943 | 7.9% |
| Open Punctuation | 70 | 0.6% |
| Close Punctuation | 70 | 0.6% |
| Dash Punctuation | 7 | 0.1% |
Most frequent character per category
Space Separator
| Value | Count | Frequency (%) |
| (space) | 1388 | |
| Value | Count | Frequency (%) |
| (space) | 1356 | |
Lowercase Letter
| Value | Count | Frequency (%) |
| r | 989 | |
| e | 861 | |
| a | 813 | |
| n | 661 | |
| i | 660 | |
| s | 635 | |
| l | 538 | 6.9% |
| o | 508 | 6.5% |
| t | 323 | 4.1% |
| h | 269 | 3.5% |
| Other values (16) | 1531 |
| Value | Count | Frequency (%) |
| r | 965 | |
| e | 876 | |
| a | 834 | |
| n | 662 | |
| s | 654 | |
| i | 644 | |
| l | 517 | 6.7% |
| o | 502 | 6.5% |
| t | 323 | 4.2% |
| d | 253 | 3.3% |
| Other values (16) | 1484 |
Uppercase Letter
| Value | Count | Frequency (%) |
| M | 574 | |
| A | 130 | 7.1% |
| J | 119 | 6.5% |
| H | 97 | 5.3% |
| C | 89 | 4.8% |
| S | 87 | 4.7% |
| E | 80 | 4.3% |
| W | 79 | 4.3% |
| B | 69 | 3.8% |
| L | 63 | 3.4% |
| Other values (15) | 453 |
| Value | Count | Frequency (%) |
| M | 550 | |
| A | 122 | 6.8% |
| H | 102 | 5.6% |
| J | 102 | 5.6% |
| C | 83 | 4.6% |
| S | 82 | 4.5% |
| E | 78 | 4.3% |
| B | 75 | 4.2% |
| W | 67 | 3.7% |
| G | 65 | 3.6% |
| Other values (15) | 480 |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 446 | |
| , | 446 | |
| " | 50 | 5.3% |
| ' | 3 | 0.3% |
| Value | Count | Frequency (%) |
| . | 446 | |
| , | 446 | |
| " | 46 | 4.9% |
| ' | 4 | 0.4% |
| / | 1 | 0.1% |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 81 |
| Value | Count | Frequency (%) |
| ) | 70 |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 81 |
| Value | Count | Frequency (%) |
| ( | 70 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 5 |
| Value | Count | Frequency (%) |
| - | 7 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 9628 | |
| Common | 2500 | 20.6% |
| Value | Count | Frequency (%) |
| Latin | 9520 | |
| Common | 2446 | 20.4% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| (space) | 1388 | |
| . | 446 | 17.8% |
| , | 446 | 17.8% |
| ) | 81 | 3.2% |
| ( | 81 | 3.2% |
| " | 50 | 2.0% |
| - | 5 | 0.2% |
| ' | 3 | 0.1% |
| Value | Count | Frequency (%) |
| (space) | 1356 | |
| . | 446 | 18.2% |
| , | 446 | 18.2% |
| ( | 70 | 2.9% |
| ) | 70 | 2.9% |
| " | 46 | 1.9% |
| - | 7 | 0.3% |
| ' | 4 | 0.2% |
| / | 1 | < 0.1% |
Latin
| Value | Count | Frequency (%) |
| r | 989 | 10.3% |
| e | 861 | 8.9% |
| a | 813 | 8.4% |
| n | 661 | 6.9% |
| i | 660 | 6.9% |
| s | 635 | 6.6% |
| M | 574 | 6.0% |
| l | 538 | 5.6% |
| o | 508 | 5.3% |
| t | 323 | 3.4% |
| Other values (41) | 3066 |
| Value | Count | Frequency (%) |
| r | 965 | 10.1% |
| e | 876 | 9.2% |
| a | 834 | 8.8% |
| n | 662 | 7.0% |
| s | 654 | 6.9% |
| i | 644 | 6.8% |
| M | 550 | 5.8% |
| l | 517 | 5.4% |
| o | 502 | 5.3% |
| t | 323 | 3.4% |
| Other values (41) | 2993 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 12128 |
| Value | Count | Frequency (%) |
| ASCII | 11966 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| (space) | 1388 | 11.4% |
| r | 989 | 8.2% |
| e | 861 | 7.1% |
| a | 813 | 6.7% |
| n | 661 | 5.5% |
| i | 660 | 5.4% |
| s | 635 | 5.2% |
| M | 574 | 4.7% |
| l | 538 | 4.4% |
| o | 508 | 4.2% |
| Other values (49) | 4501 |
| Value | Count | Frequency (%) |
| (space) | 1356 | 11.3% |
| r | 965 | 8.1% |
| e | 876 | 7.3% |
| a | 834 | 7.0% |
| n | 662 | 5.5% |
| s | 654 | 5.5% |
| i | 644 | 5.4% |
| M | 550 | 4.6% |
| l | 517 | 4.3% |
| o | 502 | 4.2% |
| Other values (50) | 4406 |
Sex
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 2 | 2 |
| Distinct (%) | 0.4% | 0.4% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 6 | 6 |
| Median length | 4 | 4 |
| Mean length | 4.7085202 | 4.6950673 |
| Min length | 4 | 4 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 2100 | 2094 |
| Distinct characters | 5 | 5 |
| Distinct categories | 1 | 1 |
| Distinct scripts | 1 | 1 |
| Distinct blocks | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | female | male |
| 2nd row | female | male |
| 3rd row | male | male |
| 4th row | male | female |
| 5th row | male | male |
Common Values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| male | 288 | 64.6% |
| female | 158 | 35.4% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| male | 291 | 65.2% |
| female | 155 | 34.8% |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 604 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 158 | 7.5% |
| Value | Count | Frequency (%) |
| e | 601 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 155 | 7.4% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 2100 |
| Value | Count | Frequency (%) |
| Lowercase Letter | 2094 |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 604 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 158 | 7.5% |
| Value | Count | Frequency (%) |
| e | 601 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 155 | 7.4% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 2100 |
| Value | Count | Frequency (%) |
| Latin | 2094 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 604 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 158 | 7.5% |
| Value | Count | Frequency (%) |
| e | 601 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 155 | 7.4% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 2100 |
| Value | Count | Frequency (%) |
| ASCII | 2094 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| e | 604 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 158 | 7.5% |
| Value | Count | Frequency (%) |
| e | 601 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 155 | 7.4% |
Age
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 76 | 72 |
| Distinct (%) | 20.9% | 19.7% |
| Missing | 82 | 81 |
| Missing (%) | 18.4% | 18.2% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 30.596621 | 29.834712 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0.67 | 0.42 |
| Maximum | 80 | 74 |
| Zeros | 0 | 0 |
| Zeros (%) | 0.0% | 0.0% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0.67 | 0.42 |
| 5-th percentile | 4.15 | 5.2 |
| Q1 | 21 | 21 |
| median | 29 | 29 |
| Q3 | 39 | 38 |
| 95-th percentile | 59.85 | 54.8 |
| Maximum | 80 | 74 |
| Range | 79.33 | 73.58 |
| Interquartile range (IQR) | 18 | 17 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 14.606008 | 13.679396 |
| Coefficient of variation (CV) | 0.47737325 | 0.45850604 |
| Kurtosis | 0.32745385 | 0.053951798 |
| Mean | 30.596621 | 29.834712 |
| Median Absolute Deviation (MAD) | 9 | 8 |
| Skewness | 0.44700022 | 0.21634065 |
| Sum | 11137.17 | 10889.67 |
| Variance | 213.33548 | 187.12587 |
| Monotonicity | Not monotonic | Not monotonic |
| Value | Count | Frequency (%) |
| 22 | 15 | 3.4% |
| 28 | 14 | 3.1% |
| 36 | 13 | 2.9% |
| 27 | 13 | 2.9% |
| 24 | 13 | 2.9% |
| 31 | 12 | 2.7% |
| 19 | 12 | 2.7% |
| 25 | 12 | 2.7% |
| 18 | 11 | 2.5% |
| 26 | 10 | 2.2% |
| Other values (66) | 239 | |
| (Missing) | 82 | 18.4% |
| Value | Count | Frequency (%) |
| 30 | 16 | 3.6% |
| 36 | 14 | 3.1% |
| 24 | 13 | 2.9% |
| 31 | 12 | 2.7% |
| 28 | 12 | 2.7% |
| 21 | 12 | 2.7% |
| 32 | 11 | 2.5% |
| 22 | 11 | 2.5% |
| 25 | 11 | 2.5% |
| 23 | 10 | 2.2% |
| Other values (62) | 243 | |
| (Missing) | 81 | 18.2% |
| Value | Count | Frequency (%) |
| 0.67 | 1 | 0.2% |
| 0.75 | 1 | 0.2% |
| 0.83 | 1 | 0.2% |
| 0.92 | 1 | 0.2% |
| 1 | 3 | |
| 2 | 3 | |
| 3 | 3 | |
| 4 | 6 | |
| 5 | 1 | 0.2% |
| 7 | 2 | 0.4% |
| Value | Count | Frequency (%) |
| 0.42 | 1 | 0.2% |
| 0.75 | 1 | 0.2% |
| 1 | 5 | |
| 2 | 4 | |
| 3 | 2 | 0.4% |
| 4 | 5 | |
| 5 | 1 | 0.2% |
| 6 | 1 | 0.2% |
| 7 | 3 | |
| 8 | 1 | 0.2% |
SibSp
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 7 | 7 |
| Distinct (%) | 1.6% | 1.6% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 0.52914798 | 0.51793722 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 8 | 8 |
| Zeros | 292 | 303 |
| Zeros (%) | 65.5% | 67.9% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 0 | 0 |
| Q1 | 0 | 0 |
| median | 0 | 0 |
| Q3 | 1 | 1 |
| 95-th percentile | 2 | 3 |
| Maximum | 8 | 8 |
| Range | 8 | 8 |
| Interquartile range (IQR) | 1 | 1 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 1.0505441 | 1.04651 |
| Coefficient of variation (CV) | 1.9853503 | 2.0205344 |
| Kurtosis | 19.208665 | 15.082131 |
| Mean | 0.52914798 | 0.51793722 |
| Median Absolute Deviation (MAD) | 0 | 0 |
| Skewness | 3.7425595 | 3.3556234 |
| Sum | 236 | 231 |
| Variance | 1.1036429 | 1.0951832 |
| Monotonicity | Not monotonic | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 292 | |
| 1 | 119 | |
| 2 | 16 | 3.6% |
| 4 | 7 | 1.6% |
| 3 | 6 | 1.3% |
| 5 | 3 | 0.7% |
| 8 | 3 | 0.7% |
| Value | Count | Frequency (%) |
| 0 | 303 | |
| 1 | 105 | 23.5% |
| 2 | 14 | 3.1% |
| 3 | 10 | 2.2% |
| 4 | 8 | 1.8% |
| 5 | 4 | 0.9% |
| 8 | 2 | 0.4% |
| Value | Count | Frequency (%) |
| 0 | 292 | |
| 1 | 119 | |
| 2 | 16 | 3.6% |
| 3 | 6 | 1.3% |
| 4 | 7 | 1.6% |
| 5 | 3 | 0.7% |
| 8 | 3 | 0.7% |
| Value | Count | Frequency (%) |
| 0 | 303 | |
| 1 | 105 | 23.5% |
| 2 | 14 | 3.1% |
| 3 | 10 | 2.2% |
| 4 | 8 | 1.8% |
| 5 | 4 | 0.9% |
| 8 | 2 | 0.4% |
Parch
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 7 | 7 |
| Distinct (%) | 1.6% | 1.6% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 0.38789238 | 0.41928251 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 6 | 6 |
| Zeros | 338 | 332 |
| Zeros (%) | 75.8% | 74.4% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 0 | 0 |
| Q1 | 0 | 0 |
| median | 0 | 0 |
| Q3 | 0 | 1 |
| 95-th percentile | 2 | 2 |
| Maximum | 6 | 6 |
| Range | 6 | 6 |
| Interquartile range (IQR) | 0 | 1 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 0.83450129 | 0.84878521 |
| Coefficient of variation (CV) | 2.1513733 | 2.0243754 |
| Kurtosis | 11.488517 | 9.0163041 |
| Mean | 0.38789238 | 0.41928251 |
| Median Absolute Deviation (MAD) | 0 | 0 |
| Skewness | 2.9689634 | 2.6241397 |
| Sum | 173 | 187 |
| Variance | 0.6963924 | 0.72043634 |
| Monotonicity | Not monotonic | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 338 | |
| 1 | 63 | 14.1% |
| 2 | 36 | 8.1% |
| 3 | 3 | 0.7% |
| 5 | 3 | 0.7% |
| 4 | 2 | 0.4% |
| 6 | 1 | 0.2% |
| Value | Count | Frequency (%) |
| 0 | 332 | |
| 1 | 59 | 13.2% |
| 2 | 47 | 10.5% |
| 4 | 3 | 0.7% |
| 5 | 2 | 0.4% |
| 3 | 2 | 0.4% |
| 6 | 1 | 0.2% |
| Value | Count | Frequency (%) |
| 0 | 338 | |
| 1 | 63 | 14.1% |
| 2 | 36 | 8.1% |
| 3 | 3 | 0.7% |
| 4 | 2 | 0.4% |
| 5 | 3 | 0.7% |
| 6 | 1 | 0.2% |
| Value | Count | Frequency (%) |
| 0 | 332 | |
| 1 | 59 | 13.2% |
| 2 | 47 | 10.5% |
| 3 | 2 | 0.4% |
| 4 | 3 | 0.7% |
| 5 | 2 | 0.4% |
| 6 | 1 | 0.2% |
Ticket
Text
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 381 | 377 |
| Distinct (%) | 85.4% | 84.5% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 18 | 18 |
| Median length | 17 | 17 |
| Mean length | 6.6860987 | 6.7713004 |
| Min length | 3 | 3 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 2982 | 3020 |
| Distinct characters | 35 | 32 |
| Distinct categories | 5 | 5 |
| Distinct scripts | 2 | 2 |
| Distinct blocks | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 329 | 325 |
| Unique (%) | 73.8% | 72.9% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 17465 | F.C. 12750 |
| 2nd row | 370373 | 29106 |
| 3rd row | 17474 | C.A. 33112 |
| 4th row | SC/PARIS 2131 | 13507 |
| 5th row | 248740 | 347077 |
| Value | Count | Frequency (%) |
| pc | 33 | 5.8% |
| c.a | 11 | 1.9% |
| a/5 | 9 | 1.6% |
| ston/o | 8 | 1.4% |
| 2 | 8 | 1.4% |
| ca | 7 | 1.2% |
| w./c | 5 | 0.9% |
| sc/paris | 5 | 0.9% |
| line | 4 | 0.7% |
| 2144 | 4 | 0.7% |
| Other values (398) | 471 |
| Value | Count | Frequency (%) |
| pc | 27 | 4.8% |
| ca | 7 | 1.2% |
| c.a | 7 | 1.2% |
| 2 | 7 | 1.2% |
| ston/o | 7 | 1.2% |
| w./c | 6 | 1.1% |
| ston/o2 | 5 | 0.9% |
| a/5 | 5 | 0.9% |
| 2144 | 5 | 0.9% |
| soton/o.q | 5 | 0.9% |
| Other values (401) | 480 |
Most occurring characters
| Value | Count | Frequency (%) |
| 3 | 379 | |
| 1 | 334 | |
| 2 | 280 | |
| 7 | 256 | |
| 4 | 235 | 7.9% |
| 6 | 208 | 7.0% |
| 0 | 207 | 6.9% |
| 5 | 201 | 6.7% |
| 9 | 157 | 5.3% |
| 8 | 137 | 4.6% |
| Other values (25) | 588 |
| Value | Count | Frequency (%) |
| 3 | 356 | |
| 1 | 349 | |
| 2 | 302 | |
| 7 | 245 | |
| 4 | 241 | 8.0% |
| 6 | 216 | 7.2% |
| 5 | 196 | 6.5% |
| 0 | 193 | 6.4% |
| 9 | 170 | 5.6% |
| 8 | 145 | 4.8% |
| Other values (22) | 607 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 2394 | |
| Uppercase Letter | 320 | 10.7% |
| Other Punctuation | 136 | 4.6% |
| Space Separator | 119 | 4.0% |
| Lowercase Letter | 13 | 0.4% |
| Value | Count | Frequency (%) |
| Decimal Number | 2413 | |
| Uppercase Letter | 329 | 10.9% |
| Other Punctuation | 155 | 5.1% |
| Space Separator | 115 | 3.8% |
| Lowercase Letter | 8 | 0.3% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 3 | 379 | |
| 1 | 334 | |
| 2 | 280 | |
| 7 | 256 | |
| 4 | 235 | |
| 6 | 208 | |
| 0 | 207 | |
| 5 | 201 | |
| 9 | 157 | |
| 8 | 137 | 5.7% |
| Value | Count | Frequency (%) |
| 3 | 356 | |
| 1 | 349 | |
| 2 | 302 | |
| 7 | 245 | |
| 4 | 241 | |
| 6 | 216 | |
| 5 | 196 | |
| 0 | 193 | |
| 9 | 170 | |
| 8 | 145 |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 119 | |
| Value | Count | Frequency (%) |
| (space) | 115 | |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 88 | |
| / | 48 |
| Value | Count | Frequency (%) |
| . | 104 | |
| / | 51 |
Uppercase Letter
| Value | Count | Frequency (%) |
| C | 76 | |
| P | 50 | |
| A | 41 | |
| O | 40 | |
| S | 36 | |
| N | 19 | 5.9% |
| T | 15 | 4.7% |
| I | 9 | 2.8% |
| W | 7 | 2.2% |
| Q | 6 | 1.9% |
| Other values (6) | 21 | 6.6% |
| Value | Count | Frequency (%) |
| C | 72 | |
| O | 55 | |
| P | 47 | |
| S | 43 | |
| A | 34 | |
| N | 22 | 6.7% |
| T | 20 | 6.1% |
| W | 10 | 3.0% |
| I | 6 | 1.8% |
| Q | 6 | 1.8% |
| Other values (5) | 14 | 4.3% |
Lowercase Letter
| Value | Count | Frequency (%) |
| a | 4 | |
| s | 3 | |
| r | 2 | |
| i | 2 | |
| l | 1 | 7.7% |
| e | 1 | 7.7% |
| Value | Count | Frequency (%) |
| a | 2 | |
| r | 2 | |
| i | 2 | |
| s | 2 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 2649 | |
| Latin | 333 | 11.2% |
| Value | Count | Frequency (%) |
| Common | 2683 | |
| Latin | 337 | 11.2% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 3 | 379 | |
| 1 | 334 | |
| 2 | 280 | |
| 7 | 256 | |
| 4 | 235 | |
| 6 | 208 | |
| 0 | 207 | |
| 5 | 201 | |
| 9 | 157 | |
| 8 | 137 | 5.2% |
| Other values (3) | 255 |
| Value | Count | Frequency (%) |
| 3 | 356 | |
| 1 | 349 | |
| 2 | 302 | |
| 7 | 245 | |
| 4 | 241 | |
| 6 | 216 | |
| 5 | 196 | |
| 0 | 193 | |
| 9 | 170 | |
| 8 | 145 | |
| Other values (3) | 270 |
Latin
| Value | Count | Frequency (%) |
| C | 76 | |
| P | 50 | |
| A | 41 | |
| O | 40 | |
| S | 36 | |
| N | 19 | 5.7% |
| T | 15 | 4.5% |
| I | 9 | 2.7% |
| W | 7 | 2.1% |
| Q | 6 | 1.8% |
| Other values (12) | 34 |
| Value | Count | Frequency (%) |
| C | 72 | |
| O | 55 | |
| P | 47 | |
| S | 43 | |
| A | 34 | |
| N | 22 | 6.5% |
| T | 20 | 5.9% |
| W | 10 | 3.0% |
| I | 6 | 1.8% |
| Q | 6 | 1.8% |
| Other values (9) | 22 | 6.5% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 2982 |
| Value | Count | Frequency (%) |
| ASCII | 3020 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 3 | 379 | |
| 1 | 334 | |
| 2 | 280 | |
| 7 | 256 | |
| 4 | 235 | 7.9% |
| 6 | 208 | 7.0% |
| 0 | 207 | 6.9% |
| 5 | 201 | 6.7% |
| 9 | 157 | 5.3% |
| 8 | 137 | 4.6% |
| Other values (25) | 588 |
| Value | Count | Frequency (%) |
| 3 | 356 | |
| 1 | 349 | |
| 2 | 302 | |
| 7 | 245 | |
| 4 | 241 | 8.0% |
| 6 | 216 | 7.2% |
| 5 | 196 | 6.5% |
| 0 | 193 | 6.4% |
| 9 | 170 | 5.6% |
| 8 | 145 | 4.8% |
| Other values (22) | 607 |
Fare
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 183 | 177 |
| Distinct (%) | 41.0% | 39.7% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 33.987443 | 33.034996 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 512.3292 | 512.3292 |
| Zeros | 9 | 9 |
| Zeros (%) | 2.0% | 2.0% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 7.225 | 7.162525 |
| Q1 | 7.925 | 7.95625 |
| median | 14.5 | 14.47915 |
| Q3 | 31.3875 | 30.5 |
| 95-th percentile | 130.2375 | 113.275 |
| Maximum | 512.3292 | 512.3292 |
| Range | 512.3292 | 512.3292 |
| Interquartile range (IQR) | 23.4625 | 22.54375 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 50.540167 | 53.440603 |
| Coefficient of variation (CV) | 1.4870247 | 1.6176967 |
| Kurtosis | 23.254427 | 32.793948 |
| Mean | 33.987443 | 33.034996 |
| Median Absolute Deviation (MAD) | 7.21875 | 7.10625 |
| Skewness | 3.9724626 | 4.8980006 |
| Sum | 15158.4 | 14733.608 |
| Variance | 2554.3084 | 2855.898 |
| Monotonicity | Not monotonic | Not monotonic |
| Value | Count | Frequency (%) |
| 8.05 | 22 | 4.9% |
| 7.75 | 19 | 4.3% |
| 13 | 19 | 4.3% |
| 26 | 18 | 4.0% |
| 7.8958 | 17 | 3.8% |
| 10.5 | 11 | 2.5% |
| 7.925 | 10 | 2.2% |
| 8.6625 | 9 | 2.0% |
| 0 | 9 | 2.0% |
| 7.775 | 9 | 2.0% |
| Other values (173) | 303 |
| Value | Count | Frequency (%) |
| 13 | 29 | 6.5% |
| 8.05 | 22 | 4.9% |
| 7.8958 | 18 | 4.0% |
| 7.75 | 16 | 3.6% |
| 26 | 16 | 3.6% |
| 7.925 | 10 | 2.2% |
| 7.775 | 9 | 2.0% |
| 0 | 9 | 2.0% |
| 26.55 | 7 | 1.6% |
| 10.5 | 7 | 1.6% |
| Other values (167) | 303 |
| Value | Count | Frequency (%) |
| 0 | 9 | |
| 5 | 1 | 0.2% |
| 6.2375 | 1 | 0.2% |
| 6.75 | 1 | 0.2% |
| 6.8583 | 1 | 0.2% |
| 6.95 | 1 | 0.2% |
| 6.975 | 1 | 0.2% |
| 7.0458 | 1 | 0.2% |
| 7.05 | 2 | 0.4% |
| 7.0542 | 2 | 0.4% |
| Value | Count | Frequency (%) |
| 0 | 9 | |
| 5 | 1 | 0.2% |
| 6.4958 | 1 | 0.2% |
| 6.75 | 1 | 0.2% |
| 6.8583 | 1 | 0.2% |
| 6.975 | 2 | 0.4% |
| 7.0458 | 1 | 0.2% |
| 7.05 | 3 | 0.7% |
| 7.0542 | 1 | 0.2% |
| 7.125 | 2 | 0.4% |
Cabin
Text
| Dataset A | Dataset B | |
|---|---|---|
| Distinct | 91 | 89 |
| Distinct (%) | 83.5% | 85.6% |
| Missing | 337 | 342 |
| Missing (%) | 75.6% | 76.7% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| Dataset A | Dataset B | |
|---|---|---|
| Max length | 15 | 15 |
| Median length | 3 | 3 |
| Mean length | 3.5963303 | 3.8365385 |
| Min length | 1 | 1 |
Characters and Unicode
| Dataset A | Dataset B | |
|---|---|---|
| Total characters | 392 | 399 |
| Distinct characters | 19 | 18 |
| Distinct categories | 3 | 3 |
| Distinct scripts | 2 | 2 |
| Distinct blocks | 1 | 1 |
Unique
| Dataset A | Dataset B | |
|---|---|---|
| Unique | 75 | 76 |
| Unique (%) | 68.8% | 73.1% |
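The Cabin percentages appear to use two different denominators: Missing (%) is relative to all 446 rows, while Distinct (%), Unique (%) and Mean length are relative to the non-missing rows only. That reading is inferred from the numbers and is not stated by the report. A short sketch that reproduces the reported figures under this assumption (the names `cabin` and `cabin_ratios` are illustrative):

```python
N_OBS = 446  # rows per dataset, from the report overview

# Counts copied from the Cabin tables above.
cabin = {
    "Dataset A": {"missing": 337, "distinct": 91, "unique": 75, "total_chars": 392},
    "Dataset B": {"missing": 342, "distinct": 89, "unique": 76, "total_chars": 399},
}


def cabin_ratios(c, n=N_OBS):
    """Derive the percentage rows, assuming non-missing rows as the
    denominator for everything except Missing (%)."""
    present = n - c["missing"]
    return {
        "present": present,
        "missing_pct": 100 * c["missing"] / n,
        "distinct_pct": 100 * c["distinct"] / present,
        "unique_pct": 100 * c["unique"] / present,
        "mean_length": c["total_chars"] / present,
    }


for name, c in cabin.items():
    r = cabin_ratios(c)
    print(
        f"{name}: present={r['present']}, "
        f"missing={r['missing_pct']:.1f}%, "
        f"distinct={r['distinct_pct']:.1f}%, "
        f"unique={r['unique_pct']:.1f}%, "
        f"mean length={r['mean_length']:.7f}"
    )
```

With only 109 and 104 populated Cabin values respectively, a difference of a few rows moves these percentages by a point or two, so the gap between the two datasets should be read with that in mind.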
Sample
| Dataset A | Dataset B | |
|---|---|---|
| 1st row | D17 | B71 |
| 2nd row | B20 | E44 |
| 3rd row | D48 | C23 C25 C27 |
| 4th row | F33 | B77 |
| 5th row | E46 | F E69 |
| Value | Count | Frequency (%) |
| c23 | 3 | 2.3% |
| c27 | 3 | 2.3% |
| d | 3 | 2.3% |
| c25 | 3 | 2.3% |
| e24 | 2 | 1.6% |
| c65 | 2 | 1.6% |
| c68 | 2 | 1.6% |
| c2 | 2 | 1.6% |
| b55 | 2 | 1.6% |
| b53 | 2 | 1.6% |
| Other values (93) | 105 |
| Value | Count | Frequency (%) |
| c23 | 4 | 3.1% |
| c27 | 4 | 3.1% |
| c25 | 4 | 3.1% |
| d | 2 | 1.6% |
| f | 2 | 1.6% |
| c124 | 2 | 1.6% |
| b77 | 2 | 1.6% |
| b55 | 2 | 1.6% |
| b53 | 2 | 1.6% |
| b18 | 2 | 1.6% |
| Other values (91) | 101 |
Most occurring characters
| Value | Count | Frequency (%) |
| 2 | 38 | 9.7% |
| C | 37 | 9.4% |
| 3 | 34 | 8.7% |
| B | 34 | 8.7% |
| 1 | 33 | 8.4% |
| 5 | 28 | 7.1% |
| 6 | 26 | 6.6% |
| E | 23 | 5.9% |
| 4 | 20 | 5.1% |
| (space) | 20 | 5.1% |
| Other values (9) | 99 |
| Value | Count | Frequency (%) |
| C | 42 | 10.5% |
| 2 | 36 | 9.0% |
| 1 | 34 | 8.5% |
| B | 34 | 8.5% |
| 5 | 32 | 8.0% |
| 3 | 27 | 6.8% |
| 6 | 24 | 6.0% |
| (space) | 23 | 5.8% |
| 4 | 22 | 5.5% |
| 7 | 20 | 5.0% |
| Other values (8) | 105 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 243 | |
| Uppercase Letter | 129 | |
| Space Separator | 20 | 5.1% |
| Value | Count | Frequency (%) |
| Decimal Number | 249 | |
| Uppercase Letter | 127 | |
| Space Separator | 23 | 5.8% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 2 | 38 | |
| 3 | 34 | |
| 1 | 33 | |
| 5 | 28 | |
| 6 | 26 | |
| 4 | 20 | |
| 8 | 20 | |
| 9 | 15 | 6.2% |
| 7 | 15 | 6.2% |
| 0 | 14 | 5.8% |
| Value | Count | Frequency (%) |
| 2 | 36 | |
| 1 | 34 | |
| 5 | 32 | |
| 3 | 27 | |
| 6 | 24 | |
| 4 | 22 | |
| 7 | 20 | |
| 8 | 19 | |
| 0 | 19 | |
| 9 | 16 |
Uppercase Letter
| Value | Count | Frequency (%) |
| C | 37 | |
| B | 34 | |
| E | 23 | |
| D | 17 | |
| A | 10 | 7.8% |
| F | 5 | 3.9% |
| G | 2 | 1.6% |
| T | 1 | 0.8% |
| Value | Count | Frequency (%) |
| C | 42 | |
| B | 34 | |
| D | 20 | |
| E | 17 | |
| A | 6 | 4.7% |
| F | 5 | 3.9% |
| G | 3 | 2.4% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 20 | |
| Value | Count | Frequency (%) |
| (space) | 23 | |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 263 | |
| Latin | 129 |
| Value | Count | Frequency (%) |
| Common | 272 | |
| Latin | 127 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 2 | 38 | |
| 3 | 34 | |
| 1 | 33 | |
| 5 | 28 | |
| 6 | 26 | |
| 4 | 20 | |
| (space) | 20 | |
| 8 | 20 | |
| 9 | 15 | 5.7% |
| 7 | 15 | 5.7% |
| Value | Count | Frequency (%) |
| 2 | 36 | |
| 1 | 34 | |
| 5 | 32 | |
| 3 | 27 | |
| 6 | 24 | |
| (space) | 23 | |
| 4 | 22 | |
| 7 | 20 | |
| 8 | 19 | |
| 0 | 19 |
Latin
| Value | Count | Frequency (%) |
| C | 37 | |
| B | 34 | |
| E | 23 | |
| D | 17 | |
| A | 10 | 7.8% |
| F | 5 | 3.9% |
| G | 2 | 1.6% |
| T | 1 | 0.8% |
| Value | Count | Frequency (%) |
| C | 42 | |
| B | 34 | |
| D | 20 | |
| E | 17 | |
| A | 6 | 4.7% |
| F | 5 | 3.9% |
| G | 3 | 2.4% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 392 |
| Value | Count | Frequency (%) |
| ASCII | 399 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 2 | 38 | 9.7% |
| C | 37 | 9.4% |
| 3 | 34 | 8.7% |
| B | 34 | 8.7% |
| 1 | 33 | 8.4% |
| 5 | 28 | 7.1% |
| 6 | 26 | 6.6% |
| E | 23 | 5.9% |
| 4 | 20 | 5.1% |
| (space) | 20 | 5.1% |
| Other values (9) | 99 |
| Value | Count | Frequency (%) |
| C | 42 | 10.5% |
| 2 | 36 | 9.0% |
| 1 | 34 | 8.5% |
| B | 34 | 8.5% |
| 5 | 32 | 8.0% |
| 3 | 27 | 6.8% |
| 6 | 24 | 6.0% |
| (space) | 23 | 5.8% |
| 4 | 22 | 5.5% |
| 7 | 20 | 5.0% |
| Other values (8) | 105 |
Embarked
Categorical
| Dataset A | Dataset B | |
|---|---|---|
| Distinct | 3 | 3 |
| Distinct (%) | 0.7% | 0.7% |
| Missing | 1 | 1 |
| Missing (%) | 0.2% | 0.2% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| Dataset A | Dataset B | |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Characters and Unicode
| Dataset A | Dataset B | |
|---|---|---|
| Total characters | 445 | 445 |
| Distinct characters | 3 | 3 |
| Distinct categories | 1 | 1 |
| Distinct scripts | 1 | 1 |
| Distinct blocks | 1 | 1 |
Unique
| Dataset A | Dataset B | |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| Dataset A | Dataset B | |
|---|---|---|
| 1st row | S | S |
| 2nd row | Q | S |
| 3rd row | S | S |
| 4th row | C | S |
| 5th row | S | S |
Common Values
| Value | Count | Frequency (%) |
| S | 320 | |
| C | 83 | 18.6% |
| Q | 42 | 9.4% |
| (Missing) | 1 | 0.2% |
| Value | Count | Frequency (%) |
| S | 316 | |
| C | 93 | 20.9% |
| Q | 36 | 8.1% |
| (Missing) | 1 | 0.2% |
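The Common Values frequencies here are relative to all 446 rows, including the one missing value, whereas the character-level tables further down count only the 445 characters that are actually present. This is why C in Dataset A shows as 18.6% here and as 18.7% in the character tables. A small sketch of the two denominators, assuming one-decimal rounding; the helper name `frequency_pct` is illustrative:

```python
N_OBS = 446  # rows per dataset, including the single missing Embarked value

# Counts copied from the Common Values tables above.
embarked_counts = {
    "Dataset A": {"S": 320, "C": 83, "Q": 42},
    "Dataset B": {"S": 316, "C": 93, "Q": 36},
}


def frequency_pct(count, denominator):
    """Share of `count` in `denominator`, as a percentage with one decimal."""
    return round(100 * count / denominator, 1)


for name, counts in embarked_counts.items():
    non_missing = sum(counts.values())  # 445 in both datasets
    for value, count in counts.items():
        per_row = frequency_pct(count, N_OBS)
        per_char = frequency_pct(count, non_missing)
        print(f"{name} {value}: {per_row}% of rows, {per_char}% of characters")
```

Under the same formula, the blank Frequency cells for S would come out at roughly 71.7% (Dataset A) and 70.9% (Dataset B) of rows; these are derived figures, not values printed by the report.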
| Value | Count | Frequency (%) |
| s | 320 | |
| c | 83 | 18.7% |
| q | 42 | 9.4% |
| Value | Count | Frequency (%) |
| s | 316 | |
| c | 93 | 20.9% |
| q | 36 | 8.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| S | 320 | |
| C | 83 | 18.7% |
| Q | 42 | 9.4% |
| Value | Count | Frequency (%) |
| S | 316 | |
| C | 93 | 20.9% |
| Q | 36 | 8.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| Uppercase Letter | 445 |
| Value | Count | Frequency (%) |
| Uppercase Letter | 445 |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| S | 320 | |
| C | 83 | 18.7% |
| Q | 42 | 9.4% |
| Value | Count | Frequency (%) |
| S | 316 | |
| C | 93 | 20.9% |
| Q | 36 | 8.1% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 445 |
| Value | Count | Frequency (%) |
| Latin | 445 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| S | 320 | |
| C | 83 | 18.7% |
| Q | 42 | 9.4% |
| Value | Count | Frequency (%) |
| S | 316 | |
| C | 93 | 20.9% |
| Q | 36 | 8.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 445 |
| Value | Count | Frequency (%) |
| ASCII | 445 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| S | 320 | |
| C | 83 | 18.7% |
| Q | 42 | 9.4% |
| Value | Count | Frequency (%) |
| S | 316 | |
| C | 93 | 20.9% |
| Q | 36 | 8.1% |
Dataset A
| | PassengerId | Age | SibSp | Parch | Fare | Survived | Pclass | Sex | Embarked |
|---|---|---|---|---|---|---|---|---|---|
| PassengerId | 1.000 | -0.004 | -0.078 | 0.067 | 0.016 | 0.179 | 0.061 | 0.078 | 0.000 |
| Age | -0.004 | 1.000 | -0.113 | -0.145 | 0.208 | 0.146 | 0.267 | 0.085 | 0.112 |
| SibSp | -0.078 | -0.113 | 1.000 | 0.428 | 0.450 | 0.173 | 0.131 | 0.221 | 0.059 |
| Parch | 0.067 | -0.145 | 0.428 | 1.000 | 0.433 | 0.128 | 0.043 | 0.218 | 0.046 |
| Fare | 0.016 | 0.208 | 0.450 | 0.433 | 1.000 | 0.301 | 0.504 | 0.229 | 0.241 |
| Survived | 0.179 | 0.146 | 0.173 | 0.128 | 0.301 | 1.000 | 0.382 | 0.620 | 0.141 |
| Pclass | 0.061 | 0.267 | 0.131 | 0.043 | 0.504 | 0.382 | 1.000 | 0.176 | 0.276 |
| Sex | 0.078 | 0.085 | 0.221 | 0.218 | 0.229 | 0.620 | 0.176 | 1.000 | 0.199 |
| Embarked | 0.000 | 0.112 | 0.059 | 0.046 | 0.241 | 0.141 | 0.276 | 0.199 | 1.000 |
Dataset B
| | PassengerId | Age | SibSp | Parch | Fare | Survived | Pclass | Sex | Embarked |
|---|---|---|---|---|---|---|---|---|---|
| PassengerId | 1.000 | 0.095 | -0.071 | 0.037 | -0.022 | 0.000 | 0.065 | 0.000 | 0.000 |
| Age | 0.095 | 1.000 | -0.177 | -0.268 | 0.095 | 0.075 | 0.236 | 0.077 | 0.000 |
| SibSp | -0.071 | -0.177 | 1.000 | 0.415 | 0.457 | 0.049 | 0.107 | 0.158 | 0.092 |
| Parch | 0.037 | -0.268 | 0.415 | 1.000 | 0.422 | 0.058 | 0.000 | 0.293 | 0.076 |
| Fare | -0.022 | 0.095 | 0.457 | 0.422 | 1.000 | 0.284 | 0.462 | 0.256 | 0.222 |
| Survived | 0.000 | 0.075 | 0.049 | 0.058 | 0.284 | 1.000 | 0.361 | 0.514 | 0.227 |
| Pclass | 0.065 | 0.236 | 0.107 | 0.000 | 0.462 | 0.361 | 1.000 | 0.119 | 0.266 |
| Sex | 0.000 | 0.077 | 0.158 | 0.293 | 0.256 | 0.514 | 0.119 | 1.000 | 0.137 |
| Embarked | 0.000 | 0.000 | 0.092 | 0.076 | 0.222 | 0.227 | 0.266 | 0.137 | 1.000 |
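The two matrices make it possible to see why the alert list at the top of the report flags Fare and Pclass as highly correlated in Dataset A only: the coefficient is 0.504 in A and 0.462 in B. The sketch below lists every variable pair whose absolute coefficient reaches a cutoff. The 0.5 cutoff is an assumption chosen to be consistent with the alerts; the report itself does not state the threshold, and the function name `high_pairs` is illustrative:

```python
COLUMNS = ["PassengerId", "Age", "SibSp", "Parch", "Fare",
           "Survived", "Pclass", "Sex", "Embarked"]

# Coefficients copied row by row from the two correlation tables above.
CORR_A = [
    [1.000, -0.004, -0.078, 0.067, 0.016, 0.179, 0.061, 0.078, 0.000],
    [-0.004, 1.000, -0.113, -0.145, 0.208, 0.146, 0.267, 0.085, 0.112],
    [-0.078, -0.113, 1.000, 0.428, 0.450, 0.173, 0.131, 0.221, 0.059],
    [0.067, -0.145, 0.428, 1.000, 0.433, 0.128, 0.043, 0.218, 0.046],
    [0.016, 0.208, 0.450, 0.433, 1.000, 0.301, 0.504, 0.229, 0.241],
    [0.179, 0.146, 0.173, 0.128, 0.301, 1.000, 0.382, 0.620, 0.141],
    [0.061, 0.267, 0.131, 0.043, 0.504, 0.382, 1.000, 0.176, 0.276],
    [0.078, 0.085, 0.221, 0.218, 0.229, 0.620, 0.176, 1.000, 0.199],
    [0.000, 0.112, 0.059, 0.046, 0.241, 0.141, 0.276, 0.199, 1.000],
]
CORR_B = [
    [1.000, 0.095, -0.071, 0.037, -0.022, 0.000, 0.065, 0.000, 0.000],
    [0.095, 1.000, -0.177, -0.268, 0.095, 0.075, 0.236, 0.077, 0.000],
    [-0.071, -0.177, 1.000, 0.415, 0.457, 0.049, 0.107, 0.158, 0.092],
    [0.037, -0.268, 0.415, 1.000, 0.422, 0.058, 0.000, 0.293, 0.076],
    [-0.022, 0.095, 0.457, 0.422, 1.000, 0.284, 0.462, 0.256, 0.222],
    [0.000, 0.075, 0.049, 0.058, 0.284, 1.000, 0.361, 0.514, 0.227],
    [0.065, 0.236, 0.107, 0.000, 0.462, 0.361, 1.000, 0.119, 0.266],
    [0.000, 0.077, 0.158, 0.293, 0.256, 0.514, 0.119, 1.000, 0.137],
    [0.000, 0.000, 0.092, 0.076, 0.222, 0.227, 0.266, 0.137, 1.000],
]


def high_pairs(matrix, columns=COLUMNS, threshold=0.5):
    """Return (row, column, coefficient) for each pair above the diagonal
    whose absolute coefficient is at least `threshold` (assumed cutoff)."""
    pairs = []
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            if abs(matrix[i][j]) >= threshold:
                pairs.append((columns[i], columns[j], matrix[i][j]))
    return pairs


print("Dataset A:", high_pairs(CORR_A))
print("Dataset B:", high_pairs(CORR_B))
```

Only the upper triangle is scanned because both matrices are symmetric, so each pair is reported once. Fare and Pclass sit close to the assumed cutoff in both datasets, so the alert appearing in A but not in B reflects a small difference in the coefficient rather than a qualitatively different relationship.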
Dataset A
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 796 | 797 | 1 | 1 | Leader, Dr. Alice (Farnham) | female | 49.0 | 0 | 0 | 17465 | 25.9292 | D17 | S |
| 289 | 290 | 1 | 3 | Connolly, Miss. Kate | female | 22.0 | 0 | 0 | 370373 | 7.7500 | NaN | Q |
| 690 | 691 | 1 | 1 | Dick, Mr. Albert Adrian | male | 31.0 | 1 | 0 | 17474 | 57.0000 | B20 | S |
| 181 | 182 | 0 | 2 | Pernot, Mr. Rene | male | NaN | 0 | 0 | SC/PARIS 2131 | 15.0500 | NaN | C |
| 342 | 343 | 0 | 2 | Collander, Mr. Erik Gustaf | male | 28.0 | 0 | 0 | 248740 | 13.0000 | NaN | S |
| 376 | 377 | 1 | 3 | Landergren, Miss. Aurora Adelia | female | 22.0 | 0 | 0 | C 7077 | 7.2500 | NaN | S |
| 313 | 314 | 0 | 3 | Hendekovic, Mr. Ignjac | male | 28.0 | 0 | 0 | 349243 | 7.8958 | NaN | S |
| 659 | 660 | 0 | 1 | Newell, Mr. Arthur Webster | male | 58.0 | 0 | 2 | 35273 | 113.2750 | D48 | C |
| 516 | 517 | 1 | 2 | Lemore, Mrs. (Amelia Milley) | female | 34.0 | 0 | 0 | C.A. 34260 | 10.5000 | F33 | S |
| 6 | 7 | 0 | 1 | McCarthy, Mr. Timothy J | male | 54.0 | 0 | 0 | 17463 | 51.8625 | E46 | S |
Dataset B
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 671 | 672 | 0 | 1 | Davidson, Mr. Thornton | male | 31.0 | 1 | 0 | F.C. 12750 | 52.0000 | B71 | S |
| 407 | 408 | 1 | 2 | Richards, Master. William Rowe | male | 3.0 | 1 | 1 | 29106 | 18.7500 | NaN | S |
| 145 | 146 | 0 | 2 | Nicholls, Mr. Joseph Charles | male | 19.0 | 1 | 1 | C.A. 33112 | 36.7500 | NaN | S |
| 577 | 578 | 1 | 1 | Silvey, Mrs. William Baird (Alice Munger) | female | 39.0 | 1 | 0 | 13507 | 55.9000 | E44 | S |
| 261 | 262 | 1 | 3 | Asplund, Master. Edvin Rojj Felix | male | 3.0 | 4 | 2 | 347077 | 31.3875 | NaN | S |
| 428 | 429 | 0 | 3 | Flynn, Mr. James | male | NaN | 0 | 0 | 364851 | 7.7500 | NaN | Q |
| 727 | 728 | 1 | 3 | Mannion, Miss. Margareth | female | NaN | 0 | 0 | 36866 | 7.7375 | NaN | Q |
| 438 | 439 | 0 | 1 | Fortune, Mr. Mark | male | 64.0 | 1 | 4 | 19950 | 263.0000 | C23 C25 C27 | S |
| 257 | 258 | 1 | 1 | Cherry, Miss. Gladys | female | 30.0 | 0 | 0 | 110152 | 86.5000 | B77 | S |
| 886 | 887 | 0 | 2 | Montvila, Rev. Juozas | male | 27.0 | 0 | 0 | 211536 | 13.0000 | NaN | S |
Dataset A
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 98 | 99 | 1 | 2 | Doling, Mrs. John T (Ada Julia Bone) | female | 34.0 | 0 | 1 | 231919 | 23.0000 | NaN | S |
| 666 | 667 | 0 | 2 | Butler, Mr. Reginald Fenton | male | 25.0 | 0 | 0 | 234686 | 13.0000 | NaN | S |
| 593 | 594 | 0 | 3 | Bourke, Miss. Mary | female | NaN | 0 | 2 | 364848 | 7.7500 | NaN | Q |
| 812 | 813 | 0 | 2 | Slemen, Mr. Richard James | male | 35.0 | 0 | 0 | 28206 | 10.5000 | NaN | S |
| 323 | 324 | 1 | 2 | Caldwell, Mrs. Albert Francis (Sylvia Mae Harbaugh) | female | 22.0 | 1 | 1 | 248738 | 29.0000 | NaN | S |
| 425 | 426 | 0 | 3 | Wiseman, Mr. Phillippe | male | NaN | 0 | 0 | A/4. 34244 | 7.2500 | NaN | S |
| 212 | 213 | 0 | 3 | Perkin, Mr. John Henry | male | 22.0 | 0 | 0 | A/5 21174 | 7.2500 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Thayer) | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 635 | 636 | 1 | 2 | Davis, Miss. Mary | female | 28.0 | 0 | 0 | 237668 | 13.0000 | NaN | S |
| 116 | 117 | 0 | 3 | Connors, Mr. Patrick | male | 70.5 | 0 | 0 | 370369 | 7.7500 | NaN | Q |
Dataset B
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 12 | 13 | 0 | 3 | Saundercock, Mr. William Henry | male | 20.0 | 0 | 0 | A/5. 2151 | 8.0500 | NaN | S |
| 754 | 755 | 1 | 2 | Herman, Mrs. Samuel (Jane Laver) | female | 48.0 | 1 | 2 | 220845 | 65.0000 | NaN | S |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 168 | 169 | 0 | 1 | Baumann, Mr. John D | male | NaN | 0 | 0 | PC 17318 | 25.9250 | NaN | S |
| 877 | 878 | 0 | 3 | Petroff, Mr. Nedelio | male | 19.0 | 0 | 0 | 349212 | 7.8958 | NaN | S |
| 346 | 347 | 1 | 2 | Smith, Miss. Marion Elsie | female | 40.0 | 0 | 0 | 31418 | 13.0000 | NaN | S |
| 347 | 348 | 1 | 3 | Davison, Mrs. Thomas Henry (Mary E Finck) | female | NaN | 1 | 0 | 386525 | 16.1000 | NaN | S |
| 189 | 190 | 0 | 3 | Turcin, Mr. Stjepan | male | 36.0 | 0 | 0 | 349247 | 7.8958 | NaN | S |
| 91 | 92 | 0 | 3 | Andreasson, Mr. Paul Edvin | male | 20.0 | 0 | 0 | 347466 | 7.8542 | NaN | S |
| 543 | 544 | 1 | 2 | Beane, Mr. Edward | male | 32.0 | 1 | 0 | 2908 | 26.0000 | NaN | S |
Dataset A
Dataset does not contain duplicate rows.
Dataset B
Dataset does not contain duplicate rows.